Skip to content

Conversation

@dnhatn
Copy link
Member

@dnhatn dnhatn commented Sep 30, 2025

PackedValuesBlockHash can be very slow with a large number of groupings. I plan to introduce a BlockHash that generates a hash for each column, then uses IntNHash to generate the final hash ID. If a multi-value block is detected, we can fall back to PackedValuesBlockHash.

@dnhatn dnhatn marked this pull request as ready for review October 1, 2025 01:23
@dnhatn dnhatn requested a review from a team as a code owner October 1, 2025 01:23
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Oct 1, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elastic elastic deleted a comment Oct 1, 2025
for (long index = slot;; index = nextSlot(index, mask)) {
final long curId = id(index);
if (curId == -1) { // means unset
setId(index, id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we be unsetting the first match that is not -1? I may misunderstood this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I copy-pasted this from other classes. I think the issue is the method name reset. This method is called after unsetting (in removeAndAdd) and is used to re-add to an empty slot instead.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea Nhat! How much do you think this will improve queries with | STATS ... BY field1.. on single valued fields?

@dnhatn
Copy link
Member Author

dnhatn commented Oct 1, 2025

How much do you think this will improve queries with | STATS ... BY field1.. on single valued fields?

It depends on the number of grouping fields and rows. However, we need another BlockHash before we can see any improvement.

@dnhatn
Copy link
Member Author

dnhatn commented Oct 1, 2025

Thanks @kkrik-es @martijnvg!

@dnhatn dnhatn merged commit dc6ac5e into elastic:main Oct 1, 2025
35 checks passed
@dnhatn dnhatn deleted the hash-n branch October 1, 2025 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants